fix: disallow uppercase values in GitLab publishers #18805

miketheman · 2025-10-06T17:53:42Z

The API and claims generated from GitLab will be desensitized when
generating JWTs.

Instead of updating the regular expression and using the same generic
error message, create a custom validator with a custom message.

Resolves #18797

Signed-off-by: Mike Fiedler [email protected]

The API and claims generated from GitLab will be desensitized when generating JWTs. Instead of updating the regular expression and using the same generic error message, create a custom validator with a custom message. Resolves pypi#18797 Signed-off-by: Mike Fiedler <[email protected]>

Signed-off-by: Mike Fiedler <[email protected]>

miketheman · 2025-10-06T17:56:34Z

I elected to update the form to reject and thus guide the user to input what becomes the ground truth value, rather than lower-case it during query time, so that we are not manipulating the ground truth, and reduce the time-to-discover a problem with the configuration.

miketheman · 2025-10-06T18:01:10Z

Looks like:

Signed-off-by: Mike Fiedler <[email protected]>

di · 2025-10-06T18:28:26Z

I think if GitLab is case-insensitive, we should just be case-insensitive as well?

miketheman · 2025-10-06T18:53:37Z

I think if GitLab is case-insensitive, we should just be case-insensitive as well?

I previously wrote:

I elected to update the form to reject and thus guide the user to input what becomes the ground truth value, rather than lower-case it during query time, so that we are not manipulating the ground truth, and reduce the time-to-discover a problem with the configuration.

Expanding on this - if we do lowercase at query-time, our machinery would still provide the end user a "publisher not found" message when incorrect, and then kick off the questions of "what's the ground truth configured, what did we get in the claims, etc". Whereas prompting them to input the value that we will expect to find later at the outset removes a set of concerns, especially as I look at other behaviors related to validation.

The way I see it, if there's data stored in the DB, that's the ground truth. We should store what folks give us, instead of lowercasing at insert-time, since then we modify the value they gave us, creating a mismatch of expectations.

If we store the user-supplied, case-sensitive value (like we do today), then we must remember that every time we access that value, we have to lowercase it, even during manual debugging in a SQL prompt. My memory isn't that good, so I'm preferring to have the user control the input and provide valid values and reduce potential confusion down the line.

Do you see an inherent problem with this approach?

di · 2025-10-06T19:49:21Z

Yeah, I think the issue is that if the user puts in caseSensitiveProject (because that's what they see displayed on GitLab and we reject it because we're being case-sensitive and overly restrictive, that's somewhat confusing:

If the user knows and expects GitLab to be case-insensitive, it's confusing because it's the opposite of what they know.
If the user doesn't know that GitLab is case-insensitive, it's confusing because it seems like we're telling them to put in something other than what their repo is named.

The way I see it, if there's data stored in the DB, that's the ground truth. We should store what folks give us, instead of lowercasing at insert-time, since then we modify the value they gave us, creating a mismatch of expectations.

I agree, we should not mutate what we've been originally given by the user before storing it.

If we store the user-supplied, case-sensitive value (like we do today), then we must remember that every time we access that value, we have to lowercase it, even during manual debugging in a SQL prompt. My memory isn't that good, so I'm preferring to have the user control the input and provide valid values and reduce potential confusion down the line.

I disagree, I don't think this is worth making users jump through extra hoops here just to do normalization for us.

I think either way when doing a manual query, we'll need to remember that there might be a difference between the repo name a user provides us and what's in the database (e.g. if they come to us with a repo named https://gitlab.com/coBib/cobib, we'll have to remember that this will be stored as cobib and not coBib in the database.

Additionally, this doesn't help any of the users that have a mis-configured publisher currently, whereas updating the query to do normalization would make their publishers work immediately:

warehouse=> SELECT count(*) FROM gitlab_oidc_publishers WHERE namespace ~ '[A-Z]' OR project ~ '[A-Z]';
 count
-------
    66
(1 row)

ewdurbin · 2025-10-07T14:55:09Z

I'm in favor of us accepting whatever the user gives, more or less full-stop. this is a recipe for frustration and errors, not dissimilar from a banking app refusing to let me paste in an account number from my password manager and forcing me to type it in piece by piece and opening the door to typos.

Example: a user wants to paste in PyPI/Receiver and is refused they may very well type in pypi/reciever and introduce a typo (my favorite typo in fact).

woodruffw · 2025-10-07T15:24:57Z

Yeah, I think PyPI should probably be permissive of casing decisions made by the user at TP enrollment/provisioning time -- it's frustrating that CI/CD services don't tend to document these details super well, but IMO the user experience is a lot better if PyPI behaves like GitHub and GitLab (which are case-insensitive-but-preserving AFAICT in this context) and allows matching regardless of how the user originally cased their inputs.

In other words, IMO we should allow uppercase values here and accept a small amount of additional complication at lookup/matching time.

(Relatedly, I think it was a mistake on my part to do a "normalization" trick for GitHub where we replace the user's given username with the one the GitHub API responds with -- similarly to here, we probably should just take whatever the user puts in as long as it's valid and honor their case choices.)

Ref:

warehouse/warehouse/oidc/forms/github.py

Lines 128 to 130 in 8b77163

    
           # NOTE: Use the normalized owner name as provided by GitHub. 
        
           self.normalized_owner = owner_info["login"] 
        
           self.owner_id = owner_info["id"]

miketheman added 2 commits October 6, 2025 13:53

make translations

ba889fd

Signed-off-by: Mike Fiedler <[email protected]>

miketheman requested a review from a team as a code owner October 6, 2025 17:53

miketheman added UX/UI design, user experience, user interface trusted-publishing labels Oct 6, 2025

docs: add a callout on lowercase

50cc6e4

Signed-off-by: Mike Fiedler <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: disallow uppercase values in GitLab publishers #18805

fix: disallow uppercase values in GitLab publishers #18805

Uh oh!

miketheman commented Oct 6, 2025

Uh oh!

miketheman commented Oct 6, 2025

Uh oh!

miketheman commented Oct 6, 2025

Uh oh!

di commented Oct 6, 2025

Uh oh!

miketheman commented Oct 6, 2025 •

edited

Loading

Uh oh!

di commented Oct 6, 2025

Uh oh!

ewdurbin commented Oct 7, 2025

Uh oh!

woodruffw commented Oct 7, 2025

Uh oh!

Uh oh!

fix: disallow uppercase values in GitLab publishers #18805

Are you sure you want to change the base?

fix: disallow uppercase values in GitLab publishers #18805

Uh oh!

Conversation

miketheman commented Oct 6, 2025

Uh oh!

miketheman commented Oct 6, 2025

Uh oh!

miketheman commented Oct 6, 2025

Uh oh!

di commented Oct 6, 2025

Uh oh!

miketheman commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

di commented Oct 6, 2025

Uh oh!

ewdurbin commented Oct 7, 2025

Uh oh!

woodruffw commented Oct 7, 2025

Uh oh!

Uh oh!

miketheman commented Oct 6, 2025 •

edited

Loading